---
id: training-data-format
sidebar_label: Training Data Format
title: Training Data Format
description: Description of the YAML format for training data
abstract: This page describes the different types of training data
  that go into a Rasa assistant and how this training data is structured.
---

## Overview

Rasa uses [YAML](https://yaml.org/spec/1.2/spec.html) as
a unified and extendable way to manage all training data,
including NLU data, stories and rules.

You can split the training data over any number of YAML files,
and each file can contain any combination of NLU data, stories, and rules.
The training data parser determines the training data type using top level keys.

The [domain](glossary.mdx#domain) uses
the same YAML format as the training data and can also be split across
multiple files or combined in one file. The domain includes
the definitions for [responses](responses.mdx) and [forms](forms.mdx).
See the [documentation for the domain](domain.mdx) for information on how to format your domain file.

:::note Legacy Formats
Looking for Markdown data format? It's removed in Rasa 3.0, but
you can still find the documentation for <a href="https://legacy-docs-v1.rasa.com/nlu/training-data-format/" target="_blank" rel="nofollow noopener noreferrer">markdown NLU data</a> and <a href="https://legacy-docs-v1.rasa.com/core/stories/" target="_blank" rel="nofollow noopener noreferrer">markdown stories</a>.
If you still have your training data in Markdown format then the recommended approach is to use Rasa 2.x
to convert your data from Markdown to YAML. [The migration guide](./migration-guide.mdx#training-data-files)
explains how to do this.
:::


### High-Level Structure

Each file can contain one or more **keys** with corresponding training
data. One file can contain multiple keys, but each key can only appear
once in a single file. The available keys are:

- `version`
- `nlu`
- `stories`
- `rules`

You should specify the `version` key in all YAML training data files.
If you don't specify a version key in your training data file, Rasa
will assume you are using the latest training data format specification supported
by the version of Rasa you have installed.
Training data files with a Rasa version greater than the version you have
installed on your machine will be skipped.
Currently, the latest training data format specification for Rasa 3.x is 3.1.

### Example

Here's a short example which keeps all training data in a single file:

```yaml-rasa
version: "3.1"

nlu:
- intent: greet
  examples: |
    - Hey
    - Hi
    - hey there [Sara](name)

- intent: faq/language
  examples: |
    - What language do you speak?
    - Do you only handle english?

stories:
- story: greet and faq
  steps:
  - intent: greet
  - action: utter_greet
  - intent: faq
  - action: utter_faq

rules:
- rule: Greet user
  steps:
  - intent: greet
  - action: utter_greet

```

To specify your test stories, you need to put them into a separate file:
```yaml-rasa title="tests/test_stories.yml"
stories:
- story: greet and ask language
- steps:
  - user: |
      hey
    intent: greet
  - action: utter_greet
  - user: |
      what language do you speak
    intent: faq/language
  - action: utter_faq
```
[Test stories](#test-stories) use the same format as the story training data and should be placed
in a separate file with the prefix `test_`.

:::note The  `|` symbol
As shown in the above examples, the `user` and `examples` keys are followed by `|`
(pipe) symbol. In YAML `|` identifies multi-line strings with preserved indentation.
This helps to keep special symbols like `"`, `'` and others still available in the
training examples.
:::

## NLU Training Data

[NLU](glossary.mdx#nlu) training data consists of example user utterances categorized by
[intent](glossary.mdx#intent). Training examples can also include [entities](glossary.mdx#entity). Entities are structured
pieces of information that can be extracted from a user's message. You can also
add extra information such as regular expressions and lookup tables to your
training data to help the model identify intents and entities correctly.

NLU training data is defined under the `nlu` key. Items that can be added under this key are:

- [Training examples](#training-examples) grouped by user intent e.g.
  optionally with annotated [entities](#entities)

```yaml-rasa
nlu:
- intent: check_balance
  examples: |
    - What's my [credit](account) balance?
    - What's the balance on my [credit card account]{"entity":"account","value":"credit"}
```

- [Synonyms](#synonyms)

```yaml-rasa
nlu:
- synonym: credit
  examples: |
    - credit card account
    - credit account
```

- [Regular expressions](#regular-expressions)

```yaml-rasa
nlu:
- regex: account_number
  examples: |
    - \d{10,12}
```

- [Lookup tables](#lookup-tables)

```yaml-rasa
nlu:
- lookup: banks
  examples: |
    - JPMC
    - Comerica
    - Bank of America
```

### Training Examples

Training examples are grouped by [intent](glossary.mdx#intent) and listed under the
`examples` key. Usually, you'll list one example per line as follows:

```yaml-rasa
nlu:
- intent: greet
  examples: |
    - hey
    - hi
    - whats up
```

However, it's also possible to use an extended format if you have a custom NLU component and need metadata for your examples:


```yaml-rasa
nlu:
- intent: greet
  examples:
  - text: |
      hi
    metadata:
      sentiment: neutral
  - text: |
      hey there!
```

The `metadata` key can contain arbitrary key-value data that is tied to an example and
accessible by the components in the NLU pipeline.
In the example above, the sentiment metadata could be used by a custom component in
the pipeline for sentiment analysis.

You can also specify this metadata at the intent level:

```yaml-rasa
nlu:
- intent: greet
  metadata:
    sentiment: neutral
  examples:
  - text: |
      hi
  - text: |
      hey there!
```

In this case, the content of the `metadata` key is passed to every intent example.


If you want to specify [retrieval intents](glossary.mdx#retrieval-intent), then your NLU examples will look as follows:
```yaml-rasa
nlu:
- intent: chitchat/ask_name
  examples: |
    - What is your name?
    - May I know your name?
    - What do people call you?
    - Do you have a name for yourself?

- intent: chitchat/ask_weather
  examples: |
    - What's the weather like today?
    - Does it look sunny outside today?
    - Oh, do you mind checking the weather for me please?
    - I like sunny days in Berlin.
```
All retrieval intents have a suffix
added to them which identifies a particular response key for your assistant. In the
above example, `ask_name` and `ask_weather` are the suffixes. The suffix is separated from
the retrieval intent name by a `/` delimiter.

:::note Special meaning of `/`
As shown in the above examples, the `/` symbol is reserved as a delimiter to separate
retrieval intents from their associated response keys. Make sure not to use it in the
name of your intents.
:::


### Entities

[Entities](glossary.mdx#entity) are structured pieces of information that can be extracted from a user's message.

Entities are annotated in training examples with the entity's name.
In addition to the entity name, you can annotate an entity with [synonyms](nlu-training-data.mdx#synonyms), [roles, or groups](nlu-training-data.mdx#entities-roles-and-groups).

In training examples, entity annotation would look like this:

```yaml-rasa
nlu:
- intent: check_balance
  examples: |
    - how much do I have on my [savings](account) account
    - how much money is in my [checking]{"entity": "account"} account
    - What's the balance on my [credit card account]{"entity":"account","value":"credit"}

```

The full possible syntax for annotating an entity is:

```text
[<entity-text>]{"entity": "<entity name>", "role": "<role name>", "group": "<group name>", "value": "<entity synonym>"}
```

The keywords `role`, `group`, and `value` are optional in this notation.
The `value` field refers to synonyms. To understand what the labels `role` and `group` are
for, see the section on [entity roles and groups](./nlu-training-data.mdx#entities-roles-and-groups).


### Synonyms

Synonyms normalize your training data by mapping an
extracted entity to a value other than the literal text extracted.
You can define synonyms using the format:

```yaml-rasa
nlu:
- synonym: credit
  examples: |
    - credit card account
    - credit account
```

You can also define synonyms in-line in your training examples by
specifying the `value` of the entity:

```yaml-rasa
nlu:
- intent: check_balance
  examples: |
    - how much do I have on my [credit card account]{"entity": "account", "value": "credit"}
    - how much do I owe on my [credit account]{"entity": "account", "value": "credit"}
```

Read more about synonyms on the [NLU Training Data page](./nlu-training-data.mdx#synonyms).

### Regular Expressions

You can use regular expressions to improve intent classification and
entity extraction using the [`RegexFeaturizer`](components.mdx#regexfeaturizer) and [`RegexEntityExtractor`](components.mdx#regexentityextractor) components.

The format for defining a regular expression is as follows:

```yaml-rasa
nlu:
- regex: account_number
  examples: |
    - \d{10,12}
```

Here `account_number` is the name of the regular expression. When used as features for the `RegexFeaturizer` the name of the regular expression does not matter. When using the `RegexEntityExtractor`, the name of the regular expression should match the name of the entity you want to extract.


Read more about when and how to use regular expressions with each component on the [NLU Training Data page](./nlu-training-data.mdx#regular-expressions).



### Lookup Tables

Lookup tables are lists of words used to generate
case-insensitive regular expression patterns. The format is as follows:

```yaml-rasa
nlu:
- lookup: banks
  examples: |
    - JPMC
    - Bank of America
```

When you supply a lookup table in your training data, the contents of that table
are combined into one large regular expression. This regex is used to check
each training example to see if it contains matches for entries in the
lookup table.

Lookup table regexes are processed identically to the regular
expressions directly specified in the training data and can be used
either with the [RegexFeaturizer](components.mdx#regexfeaturizer)
or with the [RegexEntityExtractor](components.mdx#regexentityextractor).
The name of the lookup table is subject to the same constraints as the
name of a regex feature.

Read more about using lookup tables on the [NLU Training Data page](./nlu-training-data.mdx#lookup-tables).

## Conversation Training Data

Stories and rules are both representations of conversations between a user
and a conversational assistant. They are used to train the dialogue management
model. [Stories](stories.mdx) are used to train a machine learning model
to identify patterns in conversations and generalize to unseen conversation paths.
[Rules](rules.mdx) describe small pieces of conversations that should always
follow the same path and are used to train the
[RulePolicy](policies.mdx#rule-policy).


### Stories

Stories are composed of:

  - `story`: The story's name. The name is arbitrary and not used in training;
    you can use it as a human-readable reference for the story.
  - `metadata`: arbitrary and optional, not used in training,
    you can use it to store relevant information about the story
    like e.g. the author
  - a list of `steps`: The user messages and actions that make up the story

For example:

```yaml-rasa
stories:
- story: Greet the user
  metadata:
    author: Somebody
    key: value
  steps:
  # list of steps
  - intent: greet
  - action: utter_greet
```

Each step can be one of the following:

  - A [user message](#user-messages), represented by **intent** and **entities**.
  - An [or statement](#or-statement), which includes two or more user messages under it.
  - A bot [action](#actions).
  - A [form](#forms).
  - A [slot was set](#slots) event.
  - A [checkpoint](#checkpoints), which connects the story to another story.


#### User Messages

All user messages are specified with the `intent:`
key and an optional `entities:` key.

While writing stories, you do not have to deal with the specific
contents of the messages that the users send. Instead, you can take
advantage of the output from the NLU pipeline, which uses
a combination of an intent and entities to refer to all possible
messages the users can send with the same meaning.

User messages follow the format:

```yaml-rasa {4-6}
stories:
- story: user message structure
  steps:
    - intent: intent_name  # Required
      entities:  # Optional
      - entity_name: entity_value
    - action: action_name
```

For example, to represent the sentence
`I want to check my credit balance`, where `credit` is an entity:

```yaml-rasa {4-6}
stories:
- story: story with entities
  steps:
  - intent: account_balance
    entities:
    - account_type: credit
  - action: action_credit_account_balance
```

It is important to include the entities here as well because the
policies learn to predict the next action based on a *combination* of
both the intent and entities (you can, however, change this behavior
using the [`use_entities`](#entities) attribute).


#### Actions

All actions executed by the bot are specified with the `action:` key followed
by the name of the action.
While writing stories, you will encounter two types of actions:


1. [Responses](domain.mdx#responses): start with `utter_` and
   send a specific message to the user. e.g.

```yaml-rasa {5}
stories:
- story: story with a response
  steps:
  - intent: greet
  - action: utter_greet
```

2. [Custom actions](custom-actions.mdx): start with `action_`, run
   arbitrary code and send any number of messages (or none).

```yaml-rasa {5}
stories:
- story: story with a custom action
  steps:
  - intent: feedback
  - action: action_store_feedback
```

#### Forms


A [form](glossary.mdx#form) is a specific kind of custom action that contains the logic to loop over
a set of required slots and ask the user for this information. You
[define a form](forms.mdx#defining-a-form) in the `forms` section in your domain.
Once defined, you should specify the [happy path](glossary.mdx#happy--unhappy-paths)
for a form as a [rule](forms.mdx). You should include interruptions of forms or
other "unhappy paths" in stories so that the model can
generalize to unseen conversation sequences.
As a step in a story, a form takes the following format:


```yaml-rasa
stories:
- story: story with a form
  steps:
  - intent: find_restaurant
  - action: restaurant_form                # Activate the form
  - active_loop: restaurant_form           # This form is currently active
  - active_loop: null                      # Form complete, no form is active
  - action: utter_restaurant_found
```


The `action` step activates the form and begins looping over the required slots. The `active_loop: restaurant_form`
step indicates that there is a currently active form. Much like a `slot_was_set` step,
a `form` step doesn't **set** a form to active but indicates that it should already be activated.
In the same way, the `active_loop: null` step indicates that no form should be active before the subsequent
steps are taken.

A form can be interrupted and remain active; in this case the interruption should come after the
`action: <form to activate>` step and be followed by the `active_loop: <active form>` step.
An interruption of a form could look like this:

```yaml-rasa
stories:
- story: interrupted food
  steps:
    - intent: request_restaurant
    - action: restaurant_form
    - intent: chitchat
    - action: utter_chitchat
    - active_loop: restaurant_form
    - active_loop: null
    - action: utter_slots_values
```


#### Slots

A slot event is specified under the key `slot_was_set:` with the
slot name and optionally the slot's value.

**[Slots](domain.mdx#slots)** act as the bots memory.
Slots are **set** by either the default action [`action_extract_slots`](./default-actions.mdx#action_extract_slots) according to the
[slot mappings](./domain.mdx#slot-mappings) specified in the domain, or by custom actions.
They are **referenced** by stories in `slot_was_set` steps. For example:

```yaml-rasa {5-6}
stories:
- story: story with a slot
  steps:
  - intent: celebrate_bot
  - slot_was_set:
    - feedback_value: positive
  - action: utter_yay
```

This means the story requires that the current value for the `feedback_value`
slot be `positive` for the conversation to continue as specified.

Whether or not you need to include the slot's value depends on the
[slot type](domain.mdx#slot-types) and whether the value can or should
influence the dialogue. If the value doesn't matter, as is the case for e.g. `text` slots,
you can list only the slot's name:

```yaml-rasa {5-6}
stories:
- story: story with a slot
  steps:
  - intent: greet
  - slot_was_set:
    - name
  - action: utter_greet_user_by_name
```

The initial value for any slot by default is `null`, and you can use it to check if the slot was not set:

```yaml-rasa {5-6}
stories:
- story: French cuisine
  steps:
  - intent: inform
  - slot_was_set:
    - cuisine: null
```

:::note How slots work
Stories do not **set** slots. The slot must be set by the default action `action_extract_slots` if a slot mapping applies, or custom
action **before** the `slot_was_set` step.
:::


#### Checkpoints

Checkpoints are specified with the `checkpoint:` key, either at the beginning
or the end of a story.


Checkpoints are ways to connect stories together. They can be either the first
or the last step in a story. If they are the last step in a story, that story
will be connected to each other story that starts with the checkpoint of the
same name when the model is trained. Here is an example of a story that ends
with a checkpoint, and one that starts with the same checkpoint:

```yaml-rasa
stories:
- story: story_with_a_checkpoint_1
  steps:
  - intent: greet
  - action: utter_greet
  - checkpoint: greet_checkpoint

- story: story_with_a_checkpoint_2
  steps:
  - checkpoint: greet_checkpoint
  - intent: book_flight
  - action: action_book_flight
```

Checkpoints at the beginning of stories can also be conditional on
slots being set, for example:

```yaml-rasa {6-8}
stories:
- story: story_with_a_conditional_checkpoint
  steps:
  - checkpoint: greet_checkpoint
    # This checkpoint should only apply if slots are set to the specified value
    slot_was_set:
    - context_scenario: holiday
    - holiday_name: thanksgiving
  - intent: greet
  - action: utter_greet_thanksgiving
```


Checkpoints can help simplify your training data and reduce redundancy in it,
but **do not overuse them**. Using lots of checkpoints can quickly make your
stories hard to understand. It makes sense to use them if a sequence of steps
is repeated often in different stories, but stories without checkpoints
are easier to read and write.

#### OR statement

`or` steps are ways to handle multiple intents or slot events the same way,
without writing a separate story for each intent. For example, if you ask the user to
confirm something, you might want to treat the `affirm` and `thankyou` intents in the
same way. Stories with `or` steps will be converted into multiple
separate stories at training time.
For example, the following story would be converted to two stories at training time:

```yaml-rasa {6-8}
stories:
- story: story with OR
  steps:
  - intent: signup_newsletter
  - action: utter_ask_confirm
  - or:
    - intent: affirm
    - intent: thanks
  - action: action_signup_newsletter
```

You can also use `or` statements with slot events.
The following means the story requires that the current value for
the `name` slot is set and is either `joe` or `bob`. This story
would be converted to two stories at training time.

```yaml-rasa {6-8}
stories:
- story:
  steps:
  - intent: greet
  - action: utter_greet
  - intent: tell_name
  - or:
    - slot_was_set:
        - name: joe
    - slot_was_set:
        - name: bob
  # ... next actions
```

Just like checkpoints, OR statements can be useful, but if you are using a lot of them,
it is probably better to restructure your domain and/or intents.

:::warning Don't overuse
Overusing these features (both checkpoints and OR statements) will slow down training.
:::

### Rules

Rules are listed under the `rules` key and look similar to stories. A rule also has a `steps`
key, which contains a list of the same steps as stories do. Rules can additionally
contain the `conversation_started` and `conditions` keys. These are used to specify conditions
under which the rule should apply.

A rule that with a condition looks like this:

```yaml-rasa
rules:
- rule: Only say `hey` when the user provided a name
  condition:
  - slot_was_set:
    - user_provided_name: true
  steps:
  - intent: greet
  - action: utter_greet
```

For more information about writing rules, see [Rules](rules.mdx#writing-a-rule).

## Test Stories

Test stories check if a message is classified correctly as well as the action predictions.

Test stories use the same format as [stories](#stories),
except that user message steps can include a `user` to specify the actual
text and entity annotations of the user message. Here's an example of a
test story:

```yaml-rasa
stories:
- story: A basic end-to-end test
  steps:
  - user: |
     hey
    intent: greet
  - action: utter_ask_howcanhelp
  - user: |
     show me [chinese]{"entity": "cuisine"} restaurants
    intent: inform
  - action: utter_ask_location
  - user: |
     in [Paris]{"entity": "location"}
    intent: inform
  - action: utter_ask_price
```

You can run the tests using the following command:
```bash
rasa test
```

If you want to know more about testing head over to
[Testing Your Assistant](testing-your-assistant.mdx).


## End-to-end Training

:::info New in 2.2
End-to-end training is an experimental feature.
We introduce experimental features to get feedback from our community, so we encourage you to try it out!
However, the functionality might be changed or removed in the future.
If you have feedback (positive or negative) please share it with us on the [Rasa Forum](https://forum.rasa.com).

:::

With [end-to-end training](stories.mdx#end-to-end-training), you do not have to deal with the specific
intents of the messages that are extracted by the NLU pipeline.
Instead, you can put the text of the user message directly in the stories,
by using `user` key.

These end-to-end user messages follow the format:

```yaml-rasa {4}
stories:
- story: user message structure
  steps:
    - user: the actual text of the user message
    - action: action_name
```

In addition, you can add entity tags that can be extracted
by the [TED Policy](./policies.mdx#ted-policy).
The syntax for entity tags is the same as in
[the NLU training data](./training-data-format.mdx#entities).
For example, the following story contains the user utterance
` I can always go for sushi`. By using the syntax from the NLU training data
`[sushi](cuisine)`, you can mark `sushi` as an entity of type `cuisine`.

```yaml-rasa {4}
stories:
- story: story with entities
  steps:
  - user: I can always go for [sushi](cuisine)
  - action: utter_suggest_cuisine
```


Similarly, you can put bot utterances directly in the stories,
by using the `bot` key followed by the text that you want your bot to say.

A story with only a bot utterance might look like this:

```yaml-rasa {7}
stories:
- story: story with an end-to-end response
  steps:
  - intent: greet
    entities:
    - name: Ivan
  - bot: Hello, a person with a name!
```

You can also have a mixed end-to-end story:

```yaml-rasa
stories:
- story: full end-to-end story
  steps:
  - intent: greet
    entities:
    - name: Ivan
  - bot: Hello, a person with a name!
  - intent: search_restaurant
  - action: utter_suggest_cuisine
  - user: I can always go for [sushi](cuisine)
  - bot: Personally, I prefer pizza, but sure let's search sushi restaurants
  - action: utter_suggest_cuisine
  - user: Have a beautiful day!
  - action: utter_goodbye
```

Rasa end-to-end training is fully integrated with standard Rasa approach.
It means that you can have mixed stories with some steps defined by actions or intents
and other steps defined directly by user messages or bot responses.
